CT measurement of prostate volume using OsiriX® viewer is reliable, repeatable, and not dependent on observer, CT protocol, or contrast enhancement in dogs

Abstract Computed tomography (CT) is an established method for evaluating dogs with suspected prostatic disease; however, publications assessing the effects of varying factors on prostate volume measurements are lacking. The objectives of this two‐part, observer agreement, methods comparison study were to assess observer agreement and the effects of varying CT technical parameters for volume measurements of canine prostate glands on CT images using OsiriX® DICOM viewer software. In the first retrospective study, two observers measured prostate volumes of 13 client‐owned dogs thrice on noncontrast and contrast CT images. In the second prospective study, two observers measured the prostate volume of 10 cadavers using five different CT protocols and eight cadavers using three slice thicknesses. Observer agreement analyses were performed, and prostatic CT volume measurements were compared with water displacement volume measurements. Intra‐ and interobserver variability and the effect of contrast enhancement were found to be minimal when a one‐way analysis of variance model and intraclass correlation coefficients were used. No significant differences emerged between different protocols and slice thicknesses using a linear mixed effects model. When the prostate CT volume was compared using a Bland–Altman plot with the reference volume acquired by the water displacement method, agreement without consistent bias between the methods was shown, and over 90% of measurements were located within the 95% limits of agreement. The findings supported using OsiriX® software for CT prostatic volume measurements in dogs.


INTRODUCTION
Prostate diseases are common among intact male dogs and often lead to prostate gland enlargement. [1][2][3][4][5] Traditionally, the size of the prostate gland has been evaluated using subjective assessments such as rectal palpation or radiographic or ultrasonographic imaging. [4][5][6][7][8] Recently, computed tomography (CT) has become widely available in small animal practice. Several studies on dogs have shown CT imaging to be a reliable tool for investigating prostate gland structure and size. [9][10][11][12][13] Prostate gland boundaries, lesions, location, and surrounding tissues are more visible in CT than with ultrasonographic imaging, allowing more exact measurements to be taken. [9][10][11][12][13] Earlier, one-dimensional parameters, such as length, width, and height, have been measured; later, prostate volume has been calculated by advanced software. [9][10][11][12][13] Accurate volume measurements allow development of reference value ranges and serve as an objective tool for characterizing severity of prostatomegaly.
Computed tomography volumes of objects and organs in both dogs and humans have been compared with reference volumes, with actual volumes determined by the water displacement method 12,14-15 or known water content, 12,15,16 or with resection weights. 17 Computed tomography volume measurements have recently reached an accuracy of ±0.8%, 12 compared with ±5% 14 to ±10% 15 in the early 1980s. The number of different DICOM viewer software programs available for assessing CT images is extensive, and currently, many software programs have specific tools for CT volume measurements. Many of these have been used in volume measurement studies of multiple organs with or without reference methods. For example, Amira® software has been used to measure the CT volume of the canine 12 (version 6.2, Hillsboro, Oregon, USA) and human prostate 20 (version 3.1, Berlin, Germany), and OsiriX® (Geneva, Switzerland) software has been used for measurements of the human liver 17,18 and orbit. 19 Based on our review of previous literature, no published studies described measuring canine prostate volume using OsiriX® and compared it with a reference standard method. No studies were found in which intra- or interobserver variability of canine prostate CT volume measurement had been analyzed. Although technical parameters of CT scans might have an impact on volume measurements, no published studies have assessed the impact of tube current (mA), tube potential (kV), imaging algorithm, window level, or contrast enhancement on volume measurements. Two previous studies evaluated the effect of slice thickness and found that smaller slice thicknesses gave more accurate results. 20,21 While parameters can be kept constant in prospective studies, this is often not possible in multicenter and retrospective studies.
The objectives of our two-part study were to evaluate the intra- and interobserver variability using two observers with different experience levels as well as the effect of different CT protocols and use of contrast agent on volume measurements of prostates of intact male dogs in CT using the OsiriX® volume measurement tool. Additionally, we aimed to compare the CT volumes of the prostates with actual reference volumes. Our hypothesis was that CT volume measurements using the OsiriX® volume measurement tool would be reliable, repeatable, and not influenced by observer, CT protocol, or contrast medium.

MATERIALS AND METHODS
This was a single-center, two-part, observer agreement and method comparison study performed at the Veterinary Teaching Hospital of the University of Helsinki (VTHUH).

Sample population
In the first retrospective study, the patient data were collected from the VTHUH database. Client-owned intact male dogs aged 5 years or older that had undergone a CT scan of the caudal abdomen from August 2016 to December 2020 were included in the study (Figure 1).
Only scans including the entire prostate in both noncontrast and contrast images were considered. Dogs with known prostatic disease, hormonal implants, or treatment were excluded.
In the second prospective study, CT scans of the caudal abdomen, including the entire prostate, were performed on client-owned intact male dogs that had been euthanized at VTHUH during the period from May 2020 to March 2021 (Figure 1). As in the retrospective study, dogs with known prostatic disease, hormonal implants, or treatment were excluded. Before anonymization, decisions for including and excluding dogs for both studies were made by one of the observers (H.M.S.) with 12 years of experience in veterinary diagnostic imaging.

Ethical considerations
In the retrospective study, based on Finnish national legislation, ethical approval and owners' consent were deemed unnecessary since data were anonymized before analysis (https://finlex.fi/en/laki/kaannokset/2013/20130497). In the prospective study, owners gave consent according to the VTHUH policy for the use of cadavers for research and teaching purposes.

CT volume measurements
All measurements were performed by two observers, observer #1

Water displacement volume measurements
In the prospective study, the prostates were carefully resected from the cadavers immediately after CT scanning, and their volumes were measured thrice by observer #1 (Figure 1) using a previously described water displacement method. 8,12,14,15 The water displacement method consisted of placing the prostate in a graduated cylinder containing a recorded volume of water, after which the displaced volume of water was recorded. The size of the cylinder (with 1-5 cm³ steps) was optimized for the size of the prostate.
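The arithmetic behind the water displacement reading can be sketched as follows. This is only an illustration: the function names and the readings are hypothetical, and averaging the three repeats is an assumption, since the text states only that each prostate was measured thrice.

```python
from statistics import mean


def displaced_volume_cm3(initial_ml, final_ml):
    """Volume of the submerged object: rise in water level (1 mL = 1 cm^3)."""
    if final_ml < initial_ml:
        raise ValueError("final reading must not be below the initial reading")
    return final_ml - initial_ml


def mean_displaced_volume_cm3(readings):
    """Average several (initial, final) cylinder readings for one prostate.

    Averaging the repeats is an assumption made for this sketch.
    """
    return mean(displaced_volume_cm3(a, b) for a, b in readings)


# Hypothetical readings in mL for three repeated measurements of one prostate.
readings = [(100.0, 134.0), (100.0, 135.0), (100.0, 136.0)]
print(mean_displaced_volume_cm3(readings))
```

Because the reading is a difference of two graduations, its resolution can be no finer than the grading division of the cylinder used.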

Statistical methods
Statistical tests were selected and completed by a statistician. All statistical analyses were performed using commercially available soft- where the effect of dog was used as a fixed effect. In these models, the within-group variation described the variation between the repeats. The presented values were considered a percentage of perfect agreement. Second, to determine the intraobserver reliability estimate between the repeats, intraclass correlation coefficients (ICCs) with 95% confidence intervals (CIs) were calculated to assess consistency between the repeats. ICC was calculated both within observers and for combined data, including both observers (within noncontrast or contrast groups), and finally for the full data, including both observers and image series. ICC values of 0.01-0.20 were considered to have "slight agreement," 0.21-0.40 "fair agreement," 0.41-0.60 "moderate agreement," 0.61-0.80 "substantial agreement," and 0.81-1.00 "almost perfect agreement." 22 The random variation between observers (interobserver reliability) was evaluated with the average values of each observer using a similar methodology as described for the intraobserver repeatability.
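As a rough illustration of how an ICC can be derived from a one-way ANOVA decomposition, the sketch below computes the one-way random effects form, often written ICC(1,1), and maps it to the agreement bands listed above. The choice of this ICC form, the function names, and the data are assumptions made for illustration; the exact model and software used in the study may differ.

```python
def icc_oneway(data):
    """ICC(1,1) from a one-way ANOVA; data[i] holds the k repeats for subject i."""
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least 2 subjects with the same number (>= 2) of repeats")
    grand = sum(sum(row) for row in data) / (n * k)
    means = [sum(row) / k for row in data]
    # Between-subject and within-subject mean squares.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def agreement_label(icc):
    """Map an ICC value to the agreement bands quoted in the text."""
    if icc > 0.80:
        return "almost perfect agreement"
    if icc > 0.60:
        return "substantial agreement"
    if icc > 0.40:
        return "moderate agreement"
    if icc > 0.20:
        return "fair agreement"
    if icc >= 0.01:
        return "slight agreement"
    return "below the listed bands"


# Hypothetical volumes (cm^3): three dogs, three repeated measurements each.
volumes = [[10.1, 10.0, 10.2], [35.0, 34.8, 35.1], [80.2, 80.0, 79.9]]
icc = icc_oneway(volumes)
print(icc, agreement_label(icc))
```

When the spread between dogs is large relative to the spread between repeats, as in this made-up example, the within-subject mean square becomes negligible and the ICC approaches 1.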
In the prospective study, statistical analyses were performed between volume measurements from images obtained using five different protocols, between three different slice thicknesses, and between two different methods, i.e., CT volumetry and water displacement method. The different protocols to measure the prostate volume were compared with each other using a linear mixed effects model. The model included the protocol as the sole fixed effect and observer and dog as the random effects. Compound symmetry was used as the covariance structure. The agreement of the two observers, regardless of the protocol, was assessed with a scatter plot and by evaluating the regression equation based on a simple linear regression analysis between the two observers, where zero as the intercept and one as the slope would mean identical results between the observers. In addition, the five protocols were compared pairwise against the water displacement method using Bland-Altman plots. Based on the observed average difference, 95% limits of agreement (LoA) were constructed.
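The Bland-Altman comparison described above can be sketched as follows, assuming the conventional definition of the 95% limits of agreement as the mean difference ± 1.96 standard deviations of the differences. The function name and the paired volumes are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return bias, lower and upper 95% LoA, and the fraction of points inside."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # conventional 95% limits (assumed here)
    lower, upper = bias - half_width, bias + half_width
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, lower, upper, inside


# Hypothetical paired volumes (cm^3): CT volumetry versus water displacement.
ct_volume = [20.5, 35.2, 48.9, 61.0, 75.4]
water_volume = [20.0, 36.0, 49.0, 60.0, 76.0]
bias, lower, upper, inside = bland_altman(ct_volume, water_volume)
print(bias, lower, upper, inside)
```

A bias close to zero with narrow limits indicates that neither method systematically reads higher than the other; the fraction inside the limits corresponds to the share of measurements within the 95% LoA.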
The three slice thicknesses were compared with a similar linear mixed effect model as the measurement protocols using slice thickness as the fixed effect and observer and dog as the random effects.

Sample population and image acquisition
In the retrospective study, 13 dogs met the inclusion criteria. The average ± SD (range) age and body weight of the dogs were 8.6 ± 2.7 In the prospective study, 10 male intact cadavers were collected.
The average ± SD (range) age and body weight were 9.
Abbreviation: CI, confidence interval. a Intraclass correlation coefficients using an absolute agreement definition.

Volume
In the retrospective study, the average ± SD (range) CT volume of all measurements of the 13 prostate glands was 34.6 ± 30.6 (1.

Intraobserver evaluation
According to the one-way ANOVA model, repeatability was 99.91-99.98% of perfect agreement between repeated volume measurements of the 13 prostates in all four groups within observers and within noncontrast and contrast groups. Moreover, intraobserver ICC values showed almost perfect agreement between repeats in all groups (Table 2). Almost perfect agreement remained after combining the groups, first both observers together and then the noncontrast and contrast groups together (Table 3).

Interobserver evaluation
According to the one-way ANOVA model, the repeatability was 99.9% and 100.0% of perfect agreement between the average values of each observer's measurements in both the noncontrast and contrast groups,

Evaluation between different CT protocols and slice thicknesses
According to a linear mixed effects model, no significant differences emerged between volume measurements from images acquired by five different CT protocols. In Figure 3, where all CT protocols were included, a simple linear regression analysis showed almost perfect agreement between the two observers. When these volumes were compared pairwise against the water displacement method using a Bland-Altman plot, there was no consistent bias of one method versus the other. Over 90% of the measurements were situated within the 95% LoA, with upper and lower limits of ±2.5 cm³ (Figure 4).
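The observer comparison by simple linear regression can be illustrated with an ordinary least-squares fit, where an intercept near zero and a slope near one indicate near-identical measurements. The function name and the volumes below are hypothetical; this is a sketch of the general technique, not the study's analysis code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical volumes (cm^3) measured by two observers on the same prostates.
observer_1 = [12.0, 25.5, 40.2, 63.1, 88.7]
observer_2 = [12.3, 25.1, 40.6, 62.8, 89.0]
intercept, slope = fit_line(observer_1, observer_2)
print(intercept, slope)
```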

FIGURE 4
Bland-Altman diagram to demonstrate the difference between CT volume and water displacement method volume against the average of these two measurements. Markers o and + are used for the measurement of observers #1 and #2, respectively, and each of five CT protocols are represented with identical markers. A solid horizontal line with a y-axis intercept of 0 represents perfect agreement, and the average bias is located at the same level. The 95% LoAs are also displayed as solid horizontal lines. The y-axis shows the difference between the two paired measurements, and the x-axis represents the average of these measurements.

DISCUSSION
In our study, intra- and interobserver analysis and evaluation between different CT protocols and slice thicknesses supported our hypothesis, which assumed that measuring prostate CT volume using the OsiriX® software tool is reliable, repeatable, and not influenced by observer, CT protocol, or contrast enhancement. Additionally, when the prostate CT volume was compared with the reference volume acquired by the water displacement method, agreement without consistent bias between the methods was shown. Intraobserver variation in prostate CT volume measurement has not previously been evaluated in either veterinary or human studies. However, intraobserver variability in orbital CT volume has been assessed in human studies. In one study, the authors found that the average difference between repeated measurements was smaller than 5%. 23 In another study, highly reproducible results were reported when the same images of the orbit were re-evaluated by the same observer two weeks after the initial assessment. 19 In our study, intraobserver variability in CT volume measurements was minimal. Based on our results, one measurement is sufficient for evaluating prostate CT volume.
Additionally, interobserver variability was minimal. Earlier studies of interobserver variability in CT volume measurements have mostly been in human medicine using several different software programs. [16][17][18] When two observers independently measured the volumes of eight abdominal organs on the same patient's CT scan, average differences were less than 5% for all organs, except adrenal glands. The researchers suggested that small measurement errors in volumes were amplified for smaller organs (e.g., right adrenal 6% and left adrenal 21%). 16 Interobserver variability was also evaluated by assessing human liver CT volumes. 17,18 In one study, two trained medical students and a specialized liver radiologist measured the liver volumes of 25 patients using two different software programs. 17 In another study, two newly trained and two experienced observers measured the liver CT volumes of 30 patients with two software programs. 18 In these studies, interobserver variability was small within and between software programs, and the level of experience did not significantly affect the results. 17,18 Similarly, a study assessing interobserver variability of orbital volumes measured by two surgeons demonstrated a high level of accuracy. 19 These findings are consistent with our results of minimal interobserver variability. In addition, a clinician familiarized with the procedure and local anatomy is capable of taking accurate measurements, which are comparable to measurements made by an experienced veterinary radiologist.
Based on our review of the literature, no previously published studies were found comparing volumes measured on images acquired using different CT protocols or on both noncontrast and contrast images of the same scan. Typically, the measurements have been done on contrast image series. 12,[16][17][18]21 We found that volume measurements of the prostate do not differ between images acquired using different protocols or between noncontrast and contrast images. Based on these results, noncontrast images or images acquired using different protocols can be used and are comparable to each other for clinical or research purposes.
In veterinary medicine, the effect of slice thickness has not been previously reported based on our review of the literature. We found that a slice thickness varying between 0.625 and 2.5 mm did not significantly affect CT volume measurements, which is consistent with previous literature in human medicine. 20,21 Studies assessing the effect of slice thickness using human prostate or liver volumes reported that accuracy decreased when slice thicknesses of over 5 mm were used. In addition, smaller objects were significantly more affected by slice thickness than larger ones. 20,21 Volume measurement using small slice thickness is quite time-consuming and would therefore not be practical in routine clinical use. In our study, we did not routinely record the required time for CT measurements. However, a medium-sized prostate with a volume of 50 cm³
In the prospective part, there was no consistent bias of the CT volume versus water displacement method, and over 90% of measurements were situated within the 95% LoA (Figure 4). However, some outliers were detected. The outliers were probably due to the limited accuracy of the graduated cylinder. When large prostates were measured, a graduated cylinder with a grading division of 5 cm³ had to be used. We assume that this led to lower measurement accuracy in the water displacement method and caused larger differences when compared with CT volumes. Only a few previous studies have compared CT volumes with actual reference volumes. 12 24,25 and different window levels used when comparing measurements between two software programs. 17 When comparing CT volumes of the canine prostate to actual volumes, overestimation has not been found, 12 which is consistent with our results. Changes in fluid content are possible during both resection and sinking of the preparation in the water during measurement with the water displacement method. This was not noticed in our study.
The effect of varying the window level causing potential overestimation of the measured volume was excluded in our study by standardizing the window level.
Difficulties in defining prostate boundaries and contouring the prostate on CT images have been reported in human medicine. Examples of these are the tendency to include portions of neurovascular bundles, poor definition of the interface between the posterior prostate edge and the anterior rectal wall, and difficulties in distinguishing the lower limit of the prostate apical region because of its close proximity to the pelvic floor muscles and the poor contrast between these two soft tissues. 27 In our material, we did not have similar difficulties. However, we excluded dogs with known prostatic disease, which was presumably why we did not have severely altered prostates in our sample. The ability to contour prostate boundaries might be different with enlarged and irregularly shaped prostates that protrude close to adjacent soft tissues. The small and regularly shaped prostates in our study were surrounded by fat tissue, which gave good contrast to the prostate tissue and helped to define the prostate boundary even without contrast medium.
Algorithms of the volume measurement tools of different software are not generally open to users, making direct comparisons difficult. In our study, no concerns arose that would prevent our findings from being generalized to other software that provides a similar technique for volume measurement. The advantages of OsiriX® include that it is not bound to the workstation of the CT scanner and is easily available to users. A potential obstacle for users who want to use OsiriX® is that it is only compatible with the Mac® operating system.
This study has several limitations. First, because we excluded dogs with known prostatic disease, the results cannot be generalized to severely abnormal and enlarged prostates. Second, the number of dogs and cadavers was small. Third, the graduated cylinder with a grading division of 1-5 cm³ enables a maximum measurement accuracy of one decimal, while in CT measurements, four decimals were calculated. Thus, the water displacement method provides only a coarse reference to which the CT volume can be compared. Fourth, comparisons between different protocols, slice thicknesses, and CT volume and water displacement methods were performed only with cadavers. Finally, not all protocol parameters and conditions could be standardized in the retrospective study, e.g., the acquisition time of contrast images varied.
In conclusion, measuring the prostate volume on CT images is a repeatable and reliable method. Volume measurements have sufficient accuracy for clinical and research purposes regardless of which CT protocol is used in the acquisition of images. Additionally, measuring the volume from noncontrast images is as repeatable and reliable as measuring the volume from contrast images. The reliability of the measurements is not dependent on the availability of a specialist; a clinician familiar with the technique is also capable of measuring the prostate. Further studies are needed to evaluate the reliability of volume measurements of severely abnormal or enlarged prostates and to determine reference values for prostate size on CT.